[Executorch][llm] Fix ring kv cache when used with quantized kv cache and sdpa #12143
…ssful
At the moment, execution continues and the stack fails later on, as I found when running with quantized kv cache + ring attention.
Differential Revision: [D77516822](https://our.internmc.facebook.com/intern/diff/D77516822/)
ghstack-source-id: 293635304
Pull Request resolved: #12129
Now that we support quantized SDPA, the query tensor can be quantized while the attention mask can be float (the only type allowed), so this check doesn't make sense anymore.
Differential Revision: [D77516821](https://our.internmc.facebook.com/intern/diff/D77516821/)
ghstack-source-id: 293661338
Pull Request resolved: #12131
… and sdpa
When using quantized kv cache and SDPA, there were two bugs:
1. `return_float_values` was not reset on `QuantizedRingKVCache`, which results in `QuantizedKVCache` returning float values after dequantization.
2. For quantized kv cache, the SDPA module stores a reference to the `kv_cache` that is owned by the attention module. When replacing the kv cache in Attention, we have to make sure that we change the reference in SDPA as well.
Differential Revision: [D77516823](https://our.internmc.facebook.com/intern/diff/D77516823/)
ghstack-source-id: 293661340
Pull Request resolved: #12132
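The two fixes described in this commit message can be illustrated with a minimal, dependency-free sketch. The class bodies and the helper `replace_with_ring_cache` below are hypothetical stand-ins written for illustration. They are not the actual ExecuTorch modules, and the attribute name `SDPA` on the attention object is an assumption. Only the names `QuantizedKVCache`, `QuantizedRingKVCache`, `return_float_values`, and `kv_cache` come from the description.

```python
# Illustrative sketch only: simplified stand-ins, not the real ExecuTorch code.

class QuantizedKVCache:
    def __init__(self):
        # Assumed default: update() hands back dequantized float values.
        self.return_float_values = True

    def update(self, k, v):
        # Report which kind of values the caller would receive.
        kind = "float" if self.return_float_values else "quantized"
        return (kind, k, v)


class QuantizedRingKVCache(QuantizedKVCache):
    """Hypothetical ring-buffer variant of the quantized cache."""


class SDPA:
    def __init__(self, kv_cache):
        # SDPA keeps its own reference to the cache owned by Attention.
        self.kv_cache = kv_cache


class Attention:
    def __init__(self):
        self.kv_cache = QuantizedKVCache()
        # Quantized SDPA path: the cache should return quantized values.
        self.kv_cache.return_float_values = False
        self.SDPA = SDPA(self.kv_cache)


def replace_with_ring_cache(attention):
    """Swap in a ring cache while applying both fixes from the description."""
    old_cache = attention.kv_cache
    ring_cache = QuantizedRingKVCache()
    # Fix 1: carry the flag over so the ring cache does not dequantize.
    ring_cache.return_float_values = old_cache.return_float_values
    attention.kv_cache = ring_cache
    # Fix 2: re-point SDPA's reference; otherwise it keeps using the old cache.
    attention.SDPA.kv_cache = ring_cache
    return attention
```

Without the second assignment, `attention.SDPA.kv_cache` would still point at the original cache object even though `attention.kv_cache` had been replaced, which is the stale-reference situation the second bug describes.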
Dr. CI (automated comment): ❌ 1 new failure, 2 unrelated failures (one flaky on trunk, one already broken on the merge base) as of commit 2530b33 with merge base 9905026. Rebase onto the `viable/strict` branch to avoid the broken-trunk failure.
… and sdpa (pytorch#12143)
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: pytorch#12132 by @kimishpatel
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/195/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/orig
@diff-train-skip-merge
Co-authored-by: Kimish Patel